This notebook provides a comprehensive, code-driven analysis of U.S. unemployment rates by demographic group, leveraging monthly data from the Federal Reserve Economic Data (FRED). The workflow combines time series analysis, statistical modeling, and policy context to uncover trends, disparities, and structural changes in the labor market.
Data Import & Preparation
Exploratory Data Analysis
Time Series Decomposition
Comparative Analysis
Gap and Ratio Analysis
Autocorrelation Analysis
Forecasting
Change Point & Anomaly Detection
Contextual Overlays
A separate docs/Policy_Changes_Unemployment_Summary.txt file is included, providing sources and context for policy events and economic shifts referenced in the analysis and visualizations. This ensures transparency and traceability for all annotated information.
This notebook is designed for researchers, students, and policymakers seeking to understand the evolution and drivers of unemployment disparities in the U.S., with a focus on both statistical rigor and real-world context.
This section imports all necessary libraries and ensures that required packages are installed using a helper function.
import importlib
import sys
import subprocess
def ensure_package(pkg, pip_name=None):
try:
importlib.import_module(pkg)
except ImportError:
pip_pkg = pip_name if pip_name else pkg
subprocess.check_call([sys.executable, "-m", "pip", "install", pip_pkg])
# Check and install required packages
ensure_package("matplotlib")
ensure_package("numpy")
ensure_package("statsmodels")
ensure_package("seaborn")
ensure_package("ruptures")
ensure_package("mplcursors")
ensure_package("pandas")
ensure_package("scikit-learn", "scikit-learn")
# Standard library
import re
# Interactive plotting
import mplcursors
# Plotting and visualization
import matplotlib.pyplot as plt
import seaborn as sns
# Data manipulation and analysis
import pandas as pd
import numpy as np
# Statistical modeling and time series analysis
import statsmodels.api as sm
import statsmodels.tsa.api as tsa
from statsmodels.tsa.stattools import adfuller, acf, pacf
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.nonparametric.smoothers_lowess import lowess
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.api import VAR
# Machine learning
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import LinearRegression
# Change point detection
import ruptures as rpt
# Jupyter display utilities
from IPython.display import display, Markdown
Parse the README.txt file to extract demographic group names for use as column headers in the dataset.
# Read the README.txt file and extract headers for the CSV file
with open('../data/README.txt', 'r') as f:
lines = f.readlines()
clean_series_names = []
for i, line in enumerate(lines):
if line.strip().lower().startswith('title'):
if i + 1 < len(lines):
header_line = lines[i + 1].strip()
if 'Unemployment Rate - ' in header_line:
# Split at 'Unemployment Rate - ' and then at the first double space
part = header_line.split('Unemployment Rate - ', 1)[1]
name = part.split('  ', 1)[0].strip()
clean_series_names.append(name)
elif header_line.startswith('Unemployment Rate'):
# Generic "Unemployment Rate" (no dash)
clean_series_names.append('Unemployment Rate')
print(clean_series_names)
Read the CSV file containing unemployment data, assign new headers, and convert the date column to datetime.
# Read the monthly unemployment CSV file
df = pd.read_csv('../data/monthly.csv')
# Change the headers: first column to 'date', rest to clean_series_names
new_headers = ['Date'] + clean_series_names
df.columns = new_headers
df['Date'] = pd.to_datetime(df['Date'])
df.head()
Create DataFrames for overall unemployment and for all groups, removing rows with missing values.
# Create a DataFrame with only the 'Unemployment Rate' column (plus Date)
df_unemployment = df[['Date', 'Unemployment Rate']].dropna()
# Truncate the original df to remove any rows with null values in any column
df_truncated = df.dropna()
# Print the head of each DataFrame
print("Unemployment Rate Only:")
print(df_unemployment.head())
print("\nTruncated Original DataFrame (no nulls):")
print(df_truncated.head())
Plot time series of unemployment rates for each demographic group and their 12-month moving averages.
# Plot time series for each demographic group in df_truncated
plt.figure(figsize=(14, 10))
for col in clean_series_names:
plt.plot(df_truncated['Date'], df_truncated[col], label=col)
plt.xlabel('Date (Monthly)')
plt.ylabel('Unemployment Rate (%)')
plt.title('Unemployment Rate by Demographic Group')
plt.legend()
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
This chart shows the monthly unemployment rate for each demographic group over time. Each line represents a different group, allowing you to visually compare trends, spikes, and recoveries across populations.
# Rolling averages (12-month moving average)
plt.figure(figsize=(14, 10))
for col in clean_series_names:
plt.plot(df_truncated['Date'], df_truncated[col].rolling(window=12, min_periods=1).mean(), label=f"{col} (12mo MA)")
plt.xlabel('Date (Monthly)')
plt.ylabel('Unemployment Rate (12mo MA)')
plt.title('12-Month Moving Average of Unemployment Rate by Demographic Group')
plt.legend()
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
This chart displays the 12-month moving average for each group's unemployment rate. The moving average smooths out short-term fluctuations, highlighting longer-term trends and cycles in unemployment for each demographic.
How is the moving average calculated?
For each month, the average unemployment rate is computed over the current and previous 11 months. This helps reveal underlying trends by reducing the impact of random month-to-month changes.
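The rolling-mean computation described above can be sketched on a toy series (the rates below are invented purely for illustration):

```python
import pandas as pd

# Hypothetical monthly rates, for illustration only
rates = pd.Series([4.0, 4.2, 4.1, 4.3, 9.0, 8.5],
                  index=pd.date_range("2020-01-01", periods=6, freq="MS"))

# 12-month window; min_periods=1 lets early months average over
# however many observations exist so far
ma = rates.rolling(window=12, min_periods=1).mean()

print(ma.iloc[0])            # average of just the first month -> 4.0
print(round(ma.iloc[1], 2))  # average of the first two months -> 4.1
```

With `min_periods=1`, the first months of the series are averaged over a shorter window rather than dropped, which is why the moving-average lines in the chart start at the same date as the raw data.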
Perform STL decomposition on selected demographic groups and plot the observed, trend, seasonal, and residual components.
# STL decomposition for Black or African American and Hispanic or Latino groups (standard 4-panel style)
selected_groups = ['Black or African American', 'Hispanic or Latino']
for group in selected_groups:
series = pd.Series(df_truncated[group].values, index=df_truncated['Date'])
stl = sm.tsa.STL(series, period=12, robust=True)
result = stl.fit()
fig, axes = plt.subplots(4, 1, figsize=(14, 10), sharex=True)
axes[0].plot(series.index, series.values, label='Observed', color='tab:blue')
axes[0].set_ylabel('Observed')
axes[0].legend(loc='upper right')
axes[1].plot(series.index, result.trend, label='Trend', color='tab:orange')
axes[1].set_ylabel('Trend')
axes[1].legend(loc='upper right')
axes[2].plot(series.index, result.seasonal, label='Seasonal', color='tab:green')
axes[2].set_ylabel('Seasonal')
axes[2].legend(loc='upper right')
axes[3].plot(series.index, result.resid, label='Residual', color='tab:red')
axes[3].set_ylabel('Residual')
axes[3].legend(loc='upper right')
axes[3].set_xlabel('Date')
fig.suptitle(f'STL Decomposition: {group}', fontsize=15)
plt.tight_layout(rect=[0, 0, 1, 0.97])
plt.show()
For each selected demographic group, the STL (Seasonal-Trend decomposition using Loess) chart is shown in a standard 4-panel layout:
Observed:
The top panel displays the original unemployment rate time series for the group. This shows the raw monthly data, including all fluctuations.
Trend:
The second panel shows the long-term trend component extracted from the data. This highlights the underlying direction of unemployment over time, smoothing out short-term variations.
Seasonal:
The third panel presents the seasonal component, capturing regular, repeating patterns within each year (such as seasonal employment cycles).
Residual:
The bottom panel displays the residual (remainder) component, representing irregular fluctuations not explained by the trend or seasonality. These may correspond to unexpected shocks or noise.
Interpretation:
STL decomposition is a powerful tool for understanding the structure of time series data and for identifying both systematic and irregular changes in unemployment rates across demographic groups.
Plot the overall unemployment rate alongside each demographic group's rate for direct comparison.
# Plot overall unemployment rate vs each demographic group
plt.figure(figsize=(14, 10))
plt.plot(df_truncated['Date'], df_truncated['Unemployment Rate'], label='Overall Unemployment Rate', linewidth=3, color='black')
for col in clean_series_names:
if col != 'Unemployment Rate':
plt.plot(df_truncated['Date'], df_truncated[col], label=col, alpha=0.7)
plt.xlabel('Date (Monthly)')
plt.ylabel('Unemployment Rate (%)')
plt.title('Overall vs Demographic Group Unemployment Rates')
plt.legend()
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
Chart Analysis: Overall vs Demographic Group Unemployment Rates
This chart compares the overall U.S. unemployment rate (black line) with the rates for each demographic group over time. Key observations:
This visualization highlights persistent disparities in unemployment across demographic groups and how these gaps evolve through economic cycles.
Fit and plot linear regression and LOESS smoothed trends for each demographic group to analyze long-term changes.
# Long-term trend comparison using linear regression and LOESS for each group
plt.figure(figsize=(14, 10))
for col in clean_series_names:
if col == 'Unemployment Rate':
continue
# Prepare data
x = np.arange(len(df_truncated))
y = df_truncated[col].values
# Linear Regression
lr = LinearRegression()
lr.fit(x.reshape(-1, 1), y)
y_pred = lr.predict(x.reshape(-1, 1))
# LOESS smoothing
loess_smoothed = lowess(y, x, frac=0.15, return_sorted=False)
# Plot
plt.plot(df_truncated['Date'], y, label=f"{col} (actual)", alpha=0.3)
plt.plot(df_truncated['Date'], y_pred, label=f"{col} (linear)", linestyle='--')
plt.plot(df_truncated['Date'], loess_smoothed, label=f"{col} (LOESS)", linewidth=2)
plt.xlabel('Date')
plt.ylabel('Unemployment Rate (%)')
plt.title('Long-term Trends in Unemployment Rate by Demographic Group')
plt.legend()
plt.tight_layout()
plt.show()
Trend Comparison Chart:
For each demographic group, the chart shows:
How the models are used:
How to interpret the charts:
Calculate and plot the unemployment rate gap and ratio between each group and the White group.
# Unemployment gap analysis: difference and ratio to White group
reference_group = 'White'
gaps = {}
ratios = {}
for col in clean_series_names:
if col in ['Unemployment Rate', reference_group]:
continue
gaps[col] = df_truncated[col] - df_truncated[reference_group]
ratios[col] = df_truncated[col] / df_truncated[reference_group]
# Plot unemployment gaps
plt.figure(figsize=(14, 10))
for col, gap in gaps.items():
plt.plot(df_truncated['Date'], gap, label=f"{col} - {reference_group}")
plt.axhline(0, color='black', linestyle='--', linewidth=1)
plt.xlabel('Date')
plt.ylabel('Unemployment Rate Gap (%)')
plt.title('Unemployment Rate Gap vs White Group')
plt.legend()
plt.tight_layout()
plt.show()
# Plot unemployment ratios
plt.figure(figsize=(14, 10))
for col, ratio in ratios.items():
plt.plot(df_truncated['Date'], ratio, label=f"{col}/{reference_group}")
plt.axhline(1, color='black', linestyle='--', linewidth=1)
plt.xlabel('Date')
plt.ylabel('Unemployment Rate Ratio')
plt.title('Unemployment Rate Ratio to White Group')
plt.legend()
plt.tight_layout()
plt.show()
Gap and Ratio Charts:
These charts compare each group to the "White" group:
Analysis:
The gap and ratio charts provide a clear visualization of structural disparities in unemployment rates between demographic groups and the White population. Persistent positive gaps and ratios above 1 for groups such as "Black or African American" and "Hispanic or Latino" indicate long-standing disadvantages in the labor market. Conversely, groups with gaps below zero or ratios below 1, such as "Asian," often experience lower unemployment rates than the White group. The temporal evolution of these metrics also reveals how economic shocks (e.g., recessions, pandemics) can widen or narrow these disparities.
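The gap and ratio metrics themselves reduce to element-wise arithmetic on the aligned series. A minimal sketch with invented figures (not actual FRED values):

```python
import pandas as pd

# Invented rates for two groups, for illustration only
df = pd.DataFrame({
    "White": [4.0, 5.0, 8.0],
    "Black or African American": [8.0, 9.0, 14.0],
})

# Gap: percentage-point difference; Ratio: relative multiple
gap = df["Black or African American"] - df["White"]
ratio = df["Black or African American"] / df["White"]

print(gap.tolist())            # [4.0, 4.0, 6.0]
print(round(ratio.mean(), 2))  # 1.85
```

Note how the two metrics can move in opposite directions during a downturn: in the toy data the gap widens (4.0 to 6.0 points) while the ratio narrows (2.0 to 1.75), which is why the analysis plots both.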
Next Steps to Enhance the Research:
By extending the analysis in these directions, the research can provide deeper insights into the causes and consequences of labor market inequality, and inform more targeted policy interventions.
Generate and plot ACF and PACF for a selected group's unemployment rate to analyze autocorrelation structure.
# ACF and PACF plots for the unemployment rate of Black or African American group
group = 'Black or African American'
series = pd.Series(df_truncated[group].values, index=df_truncated['Date'])
fig, axes = plt.subplots(1, 2, figsize=(14, 10))
plot_acf(series, lags=40, ax=axes[0], title=f'ACF of {group} Unemployment Rate')
axes[0].set_xlabel('Lag')
axes[0].set_ylabel('Autocorrelation')
plot_pacf(series, lags=40, ax=axes[1], title=f'PACF of {group} Unemployment Rate')
axes[1].set_xlabel('Lag')
axes[1].set_ylabel('Partial Autocorrelation')
plt.tight_layout()
plt.show()
The ACF (Autocorrelation Function) and PACF (Partial Autocorrelation Function) plots for the unemployment rate of the Black or African American group provide insights into the temporal dependencies in the time series data.
ACF Plot:
PACF Plot:
Interpretation:
Usage:
Explanation of ACF and PACF Chart Axes:
X-axis (Lag):
The horizontal axis on both the ACF (Autocorrelation Function) and PACF (Partial Autocorrelation Function) charts represents the lag, which is the number of periods (months) by which the unemployment rate series is shifted to compute correlations. For example, lag 1 compares each value to the value one month earlier, lag 2 compares to two months earlier, and so on.
Y-axis (Autocorrelation / Partial Autocorrelation):
The vertical axis shows the correlation coefficient:
What the Charts Reveal About the Data:
ACF Chart:
If the bars are high at low lags and gradually decrease, it indicates that the unemployment rate is highly correlated with its recent past values—showing persistence or memory in the series. Significant spikes at certain lags may indicate seasonality or cycles.
PACF Chart:
The PACF helps identify the direct effect of a specific lag, controlling for the influence of shorter lags. A significant spike at lag 1 and then a sharp drop suggests an autoregressive process of order 1 (AR(1)), meaning the current value is mostly explained by the previous value.
For the Black or African American unemployment rate:
Summary:
These charts show that the unemployment rate for this group is not random but depends on its own past values, which is typical for economic time series. The patterns in the ACF and PACF help guide model selection for forecasting and understanding the underlying dynamics.
Fit SARIMA to the overall unemployment rate and VAR to selected groups, then plot forecasts.
# --- SARIMA Forecast: Overall Unemployment Rate ---
series_unemp = df_unemployment.set_index('Date')['Unemployment Rate']
series_unemp.index = pd.to_datetime(series_unemp.index)
series_unemp = series_unemp.asfreq('MS')
# Fit SARIMA model (order can be tuned as needed)
sarima_model = SARIMAX(series_unemp, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
sarima_result = sarima_model.fit(disp=False)
forecast_periods = 24
sarima_forecast = sarima_result.get_forecast(steps=forecast_periods)
forecast_index = pd.date_range(series_unemp.index[-1] + pd.offsets.MonthBegin(1), periods=forecast_periods, freq='MS')
forecast_mean = sarima_forecast.predicted_mean
forecast_ci = sarima_forecast.conf_int(alpha=0.05)
forecast_ci.index = forecast_index
forecast_mean.index = forecast_index
plt.figure(figsize=(14, 10))
plt.plot(series_unemp, label='Observed')
plt.plot(forecast_mean.index, forecast_mean.values, label='SARIMA Forecast', color='red')
plt.fill_between(forecast_ci.index, forecast_ci.iloc[:, 0], forecast_ci.iloc[:, 1], color='pink', alpha=0.3)
plt.title('SARIMA Forecast: Overall Unemployment Rate')
plt.xlabel('Date')
plt.ylabel('Unemployment Rate (%)')
plt.legend()
plt.tight_layout()
plt.show()
# --- VAR Forecast: Unemployment Rate by Group ---
var_data = df_truncated.set_index('Date')[
['Asian', 'White', 'Black or African American', 'Hispanic or Latino', 'Foreign Born', 'Native Born']
]
var_data.index = pd.to_datetime(var_data.index)
var_data = var_data.asfreq('MS')
var_model = VAR(var_data)
var_result = var_model.fit(maxlags=12, ic='aic')
var_forecast_periods = 12
var_forecast = var_result.forecast(var_data.values[-var_result.k_ar:], steps=var_forecast_periods)
var_forecast_index = pd.date_range(var_data.index[-1] + pd.offsets.MonthBegin(1), periods=var_forecast_periods, freq='MS')
var_forecast_df = pd.DataFrame(var_forecast, index=var_forecast_index, columns=var_data.columns)
plt.figure(figsize=(14, 10))
for col in var_data.columns:
plt.plot(var_data.index, var_data[col], label=f"{col} (actual)")
plt.plot(var_forecast_df.index, var_forecast_df[col], '--', label=f"{col} (VAR forecast)")
plt.title('VAR Forecast: Unemployment Rate by Group')
plt.xlabel('Date')
plt.ylabel('Unemployment Rate (%)')
plt.legend()
plt.tight_layout()
plt.show()
Detect and plot structural breaks in the overall unemployment rate using the PELT algorithm.
# --- Change Point Detection: Overall Unemployment Rate ---
# Use the overall rate: the ACF cell above left `series` pointing at a
# single demographic group, not the overall series
series = df_unemployment.set_index('Date')['Unemployment Rate']
penalty_value = 10  # You can adjust this value as needed
algo = rpt.Pelt(model='rbf').fit(series.values)
result = algo.predict(pen=penalty_value)
plt.figure(figsize=(14, 10))
plt.plot(series.index, series.values, label='Unemployment Rate')
for cp in result[:-1]: # exclude the last point (end of series)
date = series.index[cp]
value = series.iloc[cp]
plt.axvline(date, color='red', linestyle='--', alpha=0.7)
plt.text(
date, value + 0.3, # slightly above the point
f"{date.strftime('%Y-%m')}\n{value:.1f}%",
color='red', fontsize=9, rotation=90, va='bottom', ha='center', backgroundcolor='white'
)
plt.title('Change Point Detection: Overall Unemployment Rate')
plt.xlabel('Date')
plt.ylabel('Unemployment Rate (%)')
plt.legend()
plt.tight_layout()
plt.show()
Change Point Detection: Overall Unemployment Rate
Red dashed lines indicate detected structural breaks in the unemployment rate time series.
Interpretation:
Change point analysis reveals that the unemployment rate does not evolve smoothly, but rather in distinct regimes separated by economic shocks. Recognizing these breaks is crucial for understanding labor market dynamics and for improving the accuracy of forecasting models.
Annotate detected change points with policy headlines from a summary file for context.
Reducing the penalty_value from 10 to 5 in the PELT change point detection algorithm makes the model more sensitive to smaller shifts in the unemployment rate time series. A lower penalty decreases the cost of adding additional change points, allowing the algorithm to identify not only major structural breaks but also more subtle or short-term changes in the data. This adjustment is useful when the goal is to capture a greater number of regime shifts, including those that may correspond to less dramatic but still meaningful economic events or policy changes. However, it also increases the risk of detecting noise as change points, so the penalty should be chosen carefully based on the analysis objectives.
# Reduce the penalty parameter (e.g., from 10 to 5, or lower if previously higher)
penalty_value = 5 # You can adjust this value as needed
# Run change point detection again with reduced penalty
algo = rpt.Pelt(model='rbf').fit(series.values)
result = algo.predict(pen=penalty_value)
# Plot change point detection with labels for each change point (date and percentage)
plt.figure(figsize=(14, 10))
plt.plot(series.index, series.values, label='Unemployment Rate', color='blue')
for cp in result[:-1]: # exclude the last point (end of series)
date = series.index[cp]
value = series.iloc[cp]
plt.axvline(date, color='red', linestyle='--', alpha=0.7)
plt.text(
date, value + 0.3, # slightly above the point
f"{date.strftime('%Y-%m')}\n{value:.1f}%",
color='red', fontsize=9, rotation=90, va='bottom', ha='center', backgroundcolor='white'
)
plt.title('Change Point Detection: Overall Unemployment Rate')
plt.xlabel('Date')
plt.ylabel('Unemployment Rate (%)')
plt.legend()
plt.tight_layout()
plt.show()
What does this chart show?
This chart displays the U.S. unemployment rate over time, with red dashed lines and labels indicating detected change points. Each label shows the date and unemployment rate at the change point, helping you research major economic events or policy changes that may have caused these shifts.
How are the change points calculated?
Change points are detected using the PELT (Pruned Exact Linear Time) algorithm from the ruptures library, with the "rbf" (radial basis function) cost model. This method identifies points in the time series where the mean and/or variance changes significantly, indicating a structural break. The algorithm works by minimizing a cost function that balances the fit of the model with the number of change points, using a penalty parameter to avoid overfitting. The detected change points highlight moments where the statistical properties of the unemployment rate shifted, often due to economic shocks or policy interventions.
Change in Model: Reduced Penalty for Change Point Detection
The change point detection model was re-run with a lower penalty value (pen=5), making the algorithm more sensitive to changes in the unemployment rate time series.
Interpretation:
Compare the new chart to the previous one to see if more change points are detected. If so, these may correspond to less dramatic but still meaningful changes in the unemployment rate, or they may reflect short-term fluctuations rather than major economic events.
# Read date-headline pairs from the summary file (format: "Month YYYY – Headline")
headline_map = {}
with open("../docs/Policy_Changes_Unemployment_Summary.txt", "r", encoding="utf-8") as f:
for line in f:
line = line.strip()
if not line or "–" not in line:
continue
date_part, headline = line.split("–", 1)
date_part = date_part.strip()
headline = headline.strip()
try:
# Parse date as "Month YYYY"
date_key = pd.to_datetime(date_part).to_period("M").to_timestamp()
headline_map[date_key] = headline
except Exception:
continue
# Plot change point detection with horizontal, non-overlapping labels including headlines and unemployment rate
plt.figure(figsize=(14, 10))
plt.plot(series.index, series.values, label='Unemployment Rate', color='blue')
label_positions = []
min_vgap = 3 # Minimum vertical gap between labels
for i, cp in enumerate(result[:-1]): # exclude the last point (end of series)
date = series.index[cp]
value = series.iloc[cp]
date_key = date.to_period("M").to_timestamp()
headline = headline_map.get(date_key, "")
# Label format: "Feb 2009 – Peak of the Great Recession\n8.7%"
label = f"{date.strftime('%B %Y')}"
if headline:
label += f" – {headline}"
label += f"\n{value:.1f}%"
# For the first label, place it at the top of the chart
if i == 0:
y_pos = plt.ylim()[1] - 1 # 1 unit below the top
else:
y_pos = value + 0.3
for prev_x, prev_y in label_positions:
if abs((date - prev_x).days) < 20 and abs(y_pos - prev_y) < min_vgap:
y_pos = prev_y + min_vgap
label_positions.append((date, y_pos))
plt.axvline(date, color='red', linestyle='--', alpha=0.7)
plt.text(
date, y_pos,
label,
color='red', fontsize=10, rotation=0, va='bottom', ha='center', backgroundcolor='white'
)
plt.title('Change Point Detection: Overall Unemployment Rate with Policy Headlines')
plt.xlabel('Date')
plt.ylabel('Unemployment Rate (%)')
plt.legend()
plt.tight_layout()
plt.show()
Apply Isolation Forest to detect and plot anomalies in the unemployment rate time series.
Anomaly Detection (Isolation Forest):
Orange points show months where the unemployment rate is unusually high or low compared to the rest of the series.
This method highlights outliers, such as sudden spikes or drops, rather than persistent shifts in trend.
Change Point Detection:
Previously, red dashed lines indicated structural breaks—points where the statistical properties of the series change.
Comparison:
# Anomaly Detection using Isolation Forest
# Prepare the data for anomaly detection (use the same 'series' as for change point detection)
series_values = series.values.reshape(-1, 1)
# Fit Isolation Forest for anomaly detection
iso_forest = IsolationForest(contamination=0.05, random_state=42)
anomaly_labels = iso_forest.fit_predict(series_values)
# Anomalies are labeled as -1
anomaly_indices = np.where(anomaly_labels == -1)[0]
anomaly_dates = series.index[anomaly_indices]
anomaly_values = series.values[anomaly_indices]
# Plot the results
plt.figure(figsize=(14, 10))
plt.plot(series.index, series.values, label='Unemployment Rate', color='blue')
plt.scatter(anomaly_dates, anomaly_values, color='orange', label='Anomalies', zorder=5)
plt.title('Anomaly Detection: Overall Unemployment Rate (Isolation Forest)')
plt.xlabel('Date')
plt.ylabel('Unemployment Rate (%)')
plt.legend()
plt.tight_layout()
plt.show()
Anomaly Detection vs Change Point Detection
Isolation Forest is an unsupervised machine learning algorithm designed specifically for anomaly (outlier) detection in high-dimensional datasets. It works as follows:
Isolation Principle:
Anomalies are data points that are few and different. They are easier to "isolate" from the rest of the data compared to normal points.
How It Works:
Scoring:
The algorithm assigns an anomaly score to each point based on its average path length. Points with shorter path lengths (i.e., more easily isolated) are considered more anomalous.
Interpretation in the Chart:
In the unemployment rate time series, the orange points represent months where the unemployment rate is unusually high or low compared to the overall pattern. These are detected as anomalies by the Isolation Forest because they can be separated from the rest of the data with fewer splits.
Summary:
Isolation Forest is effective for time series and tabular data, especially when anomalies are rare and different from the majority of observations. It does not require labeled data and is robust to high-dimensional features.
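The isolation principle can be demonstrated on a toy series: a single extreme value among otherwise similar observations is isolated almost immediately. A minimal sketch (values invented for illustration):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Mostly "normal" values around 5%, plus one obvious spike at the end
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(5, 0.3, 99), [14.7]]).reshape(-1, 1)

# contamination sets the expected share of anomalies (here ~1%)
iso = IsolationForest(contamination=0.01, random_state=42)
labels = iso.fit_predict(values)  # -1 marks anomalies, 1 marks normal points

print(labels[-1])  # the spike is flagged as -1
```

As in the unemployment chart, the method flags isolated extreme months rather than sustained regime shifts, which is the key contrast with change point detection.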
Label anomalies and change points on the plots with dates and values.
# Plot anomaly detection with non-overlapping labels
plt.figure(figsize=(14, 10))
plt.plot(series.index, series.values, label='Unemployment Rate', color='blue')
plt.scatter(anomaly_dates, anomaly_values, color='orange', label='Anomalies', zorder=5)
# Place labels with vertical offset to avoid overlap
label_y_positions = []
min_vgap = 0.7 # Minimum vertical gap between labels
for date, value in zip(anomaly_dates, anomaly_values):
# Find a y position that does not overlap with previous labels
y_pos = value + 0.3
for prev_x, prev_y in label_y_positions:
if abs((date - prev_x).days) < 60 and abs(y_pos - prev_y) < min_vgap:
y_pos = prev_y + min_vgap
label_y_positions.append((date, y_pos))
plt.text(
date, y_pos,
f"{date.strftime('%Y-%m')}\n{value:.1f}%",
color='orange', fontsize=9, ha='center', va='bottom', backgroundcolor='white'
)
plt.title('Anomaly Detection: Overall Unemployment Rate (Isolation Forest)')
plt.xlabel('Date')
plt.ylabel('Unemployment Rate (%)')
plt.legend()
plt.tight_layout()
plt.show()
# Plot change point detection for comparison
plt.figure(figsize=(14, 10))
plt.plot(series.index, series.values, label='Unemployment Rate', color='blue')
for cp in result[:-1]: # exclude the last point (end of series)
date = series.index[cp]
value = series.iloc[cp]
plt.axvline(date, color='red', linestyle='--', alpha=0.7)
plt.text(
date, value + 0.3,
f"{date.strftime('%Y-%m')}\n{value:.1f}%",
color='red', fontsize=9, rotation=90, va='bottom', ha='center', backgroundcolor='white'
)
plt.title('Change Point Detection: Overall Unemployment Rate')
plt.xlabel('Date')
plt.ylabel('Unemployment Rate (%)')
plt.legend()
plt.tight_layout()
plt.show()
Differences:
Similarities:
Both charts are tools for analyzing the structure and behavior of the unemployment rate time series, but they focus on different types of events.
Overlay U.S. presidential terms and annotate change points with policy headlines on the unemployment rate chart for more context.
# Read the presidents CSV file
presidents_df = pd.read_csv("../data/us_presidents_since_2004.csv")
presidents_df['date'] = pd.to_datetime(presidents_df['date'])
fig, ax = plt.subplots(figsize=(16, 10))
ax.plot(series.index, series.values, label='Unemployment Rate', color='blue')
# Overlay presidential terms as shaded regions
for i, row in presidents_df.iterrows():
start = row['date']
# Determine end of term: next president's start or end of series
if i + 1 < len(presidents_df):
end = presidents_df.loc[i + 1, 'date']
else:
end = series.index[-1]
color = 'gray' if row['political party'] == 'Republican' else 'lightblue'
ax.axvspan(start, end, color=color, alpha=0.18)
# Annotate president name above the plot, evenly spaced
mid = start + (end - start) / 2
ax.annotate(
row['president name'],
xy=(mid, 1.01), xycoords=('data', 'axes fraction'),
ha='center', va='bottom',
fontsize=13, fontweight='bold',
bbox=dict(facecolor='white', edgecolor='none', alpha=0.7, pad=2)
)
# Centered chart title
ax.set_title(
'Change Point Detection: Unemployment Rate with Policy Headlines and Presidential Terms',
fontsize=18, fontweight='bold', color='navy', pad=40, loc='center'
)
# Overlay change points with policy headlines (reuse label_positions from previous cell)
for (date, y_pos), cp in zip(label_positions, result[:-1]):
date_key = date.to_period("M").to_timestamp()
headline = headline_map.get(date_key, "")
label = f"{date.strftime('%B %Y')}"
if headline:
label += f" – {headline}"
label += f"\n{series.loc[date]:.1f}%"
ax.axvline(date, color='red', linestyle='--', alpha=0.7)
ax.text(
date, y_pos,
label,
color='red', fontsize=10, rotation=0, va='bottom', ha='center', backgroundcolor='white'
)
ax.set_xlabel('Date')
ax.set_ylabel('Unemployment Rate (%)')
ax.legend(['Unemployment Rate'])
plt.tight_layout()
plt.show()
The chart titled "Change Point Detection: Unemployment Rate with Policy Headlines and Presidential Terms" visually tracks the U.S. unemployment rate from 2004 to 2025 while highlighting key inflection points using change point detection methods. The blue line represents the monthly unemployment rate, and vertical red dashed lines indicate detected change points, each annotated with major economic or policy events that preceded or followed a noticeable shift in the unemployment trend. Overlaid backgrounds denote U.S. presidential terms: George W. Bush, Barack Obama, Donald Trump, and Joseph Biden. These temporal segments help contextualize how unemployment evolved under different administrations, offering a nuanced understanding of how fiscal and geopolitical events intersect with labor market performance.
Key turning points include February 2009, at the height of the Great Recession, where the unemployment rate surged following the 2008 financial crisis. Later markers, such as August 2011, align with the U.S. debt ceiling crisis and subsequent spending cuts, showing stagnation in job creation. A steady decline follows through the Obama administration, with another key point in February 2014 marking gradual recovery. The unemployment rate stabilizes during the Trump administration until May 2020, when the COVID-19 pandemic caused a sharp and unprecedented spike. August 2021 marks a slower recovery, as the Delta variant tempered economic reopening and job gains. The chart successfully connects unemployment shifts with policy and macroeconomic milestones.
This visual and statistical approach is extremely useful for analyzing historical economic data related to unemployment. It enables researchers, policymakers, and students to correlate sharp changes in labor market conditions with policy decisions, global events, or crises. By combining time series data with contextual annotations, the chart promotes a deeper understanding of economic causality and cyclical behavior. For future steps, integrating additional indicators—such as GDP growth, inflation, and labor force participation—could provide a more holistic analysis. Moreover, predictive modeling could be layered atop this framework to forecast how new policy decisions might influence future labor market outcomes.